Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

Authors

  • Nikola Mrksic
  • Ivan Vulic
  • Diarmuid Ó Séaghdha
  • Ira Leviant
  • Roi Reichart
  • Milica Gasic
  • Anna Korhonen
  • Steve J. Young
Abstract

We present ATTRACT-REPEL, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. ATTRACT-REPEL facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialised cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct high-quality vector spaces for a plethora of different languages, facilitating semantic transfer from high- to lower-resource ones. The effectiveness of our approach is demonstrated with state-of-the-art results on semantic similarity datasets in six languages. We next show that ATTRACT-REPEL-specialised vectors boost performance in the downstream task of dialogue state tracking (DST) across multiple languages. Finally, we show that cross-lingual vector spaces produced by our algorithm facilitate the training of multilingual DST models, which brings further performance improvements.
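The abstract does not state the objective function, so the following is only a toy sketch of the general idea of injecting pairwise constraints into pre-trained vectors. It assumes two kinds of constraints, "attract" pairs (for example synonyms or translations) and "repel" pairs (for example antonyms), plus a pull back towards the original vectors. The function name `specialise`, the update rule, the parameters `lr`, `reg` and `repel_margin`, and the word vectors themselves are all illustrative assumptions, not the paper's cost function or data.

```python
import math


def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = normalise(u), normalise(v)
    return sum(a * b for a, b in zip(u, v))


def specialise(vectors, attract, repel, lr=0.1, reg=0.05,
               repel_margin=0.0, steps=100):
    """Toy constraint injection (hypothetical, not the published objective).

    attract: pairs of words whose vectors are pulled together.
    repel:   pairs of words pushed apart while their cosine
             similarity is above repel_margin.
    reg:     strength of the pull back towards the original vectors.
    """
    original = {w: normalise(v) for w, v in vectors.items()}
    current = {w: list(v) for w, v in original.items()}
    for _ in range(steps):
        updates = {w: [0.0] * len(v) for w, v in current.items()}
        for a, b in attract:
            for i, (x, y) in enumerate(zip(current[a], current[b])):
                updates[a][i] += y - x  # move a towards b
                updates[b][i] += x - y  # move b towards a
        for a, b in repel:
            if cosine(current[a], current[b]) > repel_margin:
                for i, (x, y) in enumerate(zip(current[a], current[b])):
                    updates[a][i] -= y - x  # move a away from b
                    updates[b][i] -= x - y  # move b away from a
        for w in current:
            moved = [c + lr * u + reg * (o - c)
                     for c, u, o in zip(current[w], updates[w], original[w])]
            current[w] = normalise(moved)
    return current


# Invented 2-D vectors; "billig" stands in for a cross-lingual entry.
vectors = {
    "cheap": [1.0, 0.5],
    "inexpensive": [0.5, 1.0],
    "billig": [0.3, 1.0],
    "expensive": [1.0, 0.3],
}
attract = [("cheap", "inexpensive"), ("cheap", "billig")]
repel = [("cheap", "expensive")]
result = specialise(vectors, attract, repel)
```

In this sketch a cross-lingual constraint is just another attract pair whose two words come from different languages, which is one simple way to picture how a shared vector space could emerge from a bilingual lexicon.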

Similar resources

Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

We present ATTRACT-REPEL, an algorithm for improving the semantic quality of word vectors by injecting constraints extracted from lexical resources. ATTRACT-REPEL facilitates the use of constraints from mono- and cross-lingual resources, yielding semantically specialized cross-lingual vector spaces. Our evaluation shows that the method can make use of existing cross-lingual lexicons to construct h...

Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance

We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of s...

Learning Cross-lingual Word Embeddings via Matrix Co-factorization

A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...

Creating bilingual lexica using reference wordlists for alignment of monolingual semantic vector spaces

This paper proposes a novel method for automatically acquiring multilingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building monolingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word li...

Wiktionary-Based Word Embeddings

Vectorial representations of words have grown remarkably popular in natural language processing and machine translation. The recent surge in deep learning-inspired methods for producing distributed representations has been widely noted even outside these fields. Existing representations are typically trained on large monolingual corpora using context-based prediction models. In this paper, we p...


Journal title:
  • CoRR

Volume abs/1706.00374

Publication date 2017